Improvement of Zone Content Classification by Using Background Analysis

Authors

  • Yalin Wang
  • Robert Haralick
  • Ihsin T. Phillips
Abstract

This paper presents an improved zone content classification method. Motivated by our novel background-analysis-based table identification research, we added two new features to the feature vector from one previously published method [7]. The new features are the total area of large horizontal and large vertical blank blocks and the number of text glyphs in the zone. A binary decision tree is used to assign a zone class on the basis of its feature vector. The training and testing data sets for the algorithm include images drawn from the UWCDROM-III document image database. The classifier assigns each given scientific and technical document zone to one of nine classes: two text classes (of font size pt and font size pt), math, table, halftone, map/drawing, ruling, logo, and others. The improved zone classification method raised the accuracy rate to from and reduced the median false alarm rate to from .

1 Problem Statement

Let $\mathcal{A}$ be a set of zone entities. Let $\mathcal{L}$ be a set of content labels, such as text, table, math, etc. The function $f: \mathcal{A} \rightarrow \mathcal{L}$ associates each element of $\mathcal{A}$ with a label. The function $V: \mathcal{A} \rightarrow \Lambda$ specifies measurements made on each element of $\mathcal{A}$, where $\Lambda$ is the measurement space. The zone content classification problem can be formulated as follows: Given a zone set $\mathcal{A}$ and a content label set $\mathcal{L}$, find a classification function $f: \mathcal{A} \rightarrow \mathcal{L}$ that has the maximum probability:
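As a rough illustration of choosing a maximum-probability labeling, the Python sketch below assumes that each zone's label depends only on that zone's own measurements, so the most probable assignment can be found zone by zone. The independence assumption, the function names `classify_zones` and `assignment_probability`, and all probability values are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical posterior for one zone: P(label | that zone's measurements).
Posterior = Dict[str, float]


def classify_zones(posteriors: List[Posterior]) -> List[str]:
    """Return, for each zone, the label with the highest posterior probability."""
    return [max(p, key=p.get) for p in posteriors]


def assignment_probability(posteriors: List[Posterior], labels: List[str]) -> float:
    """Probability of a full labeling, assuming zones are labeled independently."""
    prob = 1.0
    for p, label in zip(posteriors, labels):
        prob *= p[label]
    return prob


# Made-up posteriors for two zones (illustrative values only).
zones = [
    {"text": 0.7, "table": 0.2, "math": 0.1},
    {"text": 0.1, "table": 0.6, "math": 0.3},
]

labels = classify_zones(zones)
print(labels)  # ['text', 'table']
print(assignment_probability(zones, labels))
```

Under the independence assumption, the per-zone argmax is also the labeling with the highest joint probability, because the joint probability is a product of per-zone terms and each term is maximized separately.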





Publication date: 2000